01 Pandas II: Dates & Plotting


In [1]:
import pandas as pd

Importing Data


In [2]:
df = pd.read_csv('scraped_and_cleand_six.csv')

Looking at a summary of the data


In [3]:
df.info()


<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1169 entries, 0 to 1168
Data columns (total 7 columns):
Unnamed: 0     1169 non-null int64
Company        1169 non-null object
Date           1169 non-null object
Price          1169 non-null float64
Share Total    1169 non-null int64
Type           1169 non-null object
Price_m        1169 non-null float64
dtypes: float64(2), int64(2), object(3)
memory usage: 64.0+ KB

Now let's look at the dates


In [4]:
df['Date'].head()


Out[4]:
0    30.10.2017
1    30.10.2017
2    30.10.2017
3    30.10.2017
4    27.10.2017
Name: Date, dtype: object

In [5]:
pd.to_datetime(df['Date'], format='%d.%m.%Y').head()


Out[5]:
0   2017-10-30
1   2017-10-30
2   2017-10-30
3   2017-10-30
4   2017-10-27
Name: Date, dtype: datetime64[ns]

In [6]:
df['Date'] = pd.to_datetime(df['Date'], format='%d.%m.%Y')
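
A small sketch of what the explicit format buys us, using made-up strings in the same day.month.year layout as the Date column. The names `raw`, `messy`, `parsed` and `coerced` are only for illustration; `errors="coerce"` is an optional extra that the notebook itself does not use.

```python
import pandas as pd

# Toy strings in the same day.month.year layout as the 'Date' column.
raw = pd.Series(["30.10.2017", "27.10.2017", "03.01.2017"])

# An explicit format leaves no ambiguity about which part is the day
# and which is the month ("03.01.2017" is 3 January, not 1 March).
parsed = pd.to_datetime(raw, format="%d.%m.%Y")

# errors="coerce" turns entries that do not match into NaT instead of raising.
messy = pd.Series(["30.10.2017", "not a date"])
coerced = pd.to_datetime(messy, format="%d.%m.%Y", errors="coerce")

print(parsed.dt.month.tolist())  # [10, 10, 1]
print(coerced.isna().tolist())   # [False, True]
```

Counting the NaT values afterwards is a quick way to check whether a scraped column contained anything unexpected.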

In [7]:
df.info()


<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1169 entries, 0 to 1168
Data columns (total 7 columns):
Unnamed: 0     1169 non-null int64
Company        1169 non-null object
Date           1169 non-null datetime64[ns]
Price          1169 non-null float64
Share Total    1169 non-null int64
Type           1169 non-null object
Price_m        1169 non-null float64
dtypes: datetime64[ns](1), float64(2), int64(2), object(2)
memory usage: 64.0+ KB

Let's plot the transaction counts

First we need to make the date the index


In [8]:
df.index = df['Date']

In [9]:
df.head()


Out[9]:
Unnamed: 0 Company Date Price Share Total Type Price_m
Date
2017-10-30 0 ABB Ltd 2017-10-30 362229.0 14323 Purchase 0.4
2017-10-30 1 ABB Ltd 2017-10-30 304289.0 12032 Purchase 0.3
2017-10-30 2 ABB Ltd 2017-10-30 10060.0 500 Purchase 0.0
2017-10-30 3 ABB Ltd 2017-10-30 10060.0 500 Purchase 0.0
2017-10-27 4 Banque Cantonale Vaudoise 2017-10-27 10620.0 15 Purchase 0.0
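
As the output above shows, assigning the column to `df.index` keeps Date both as the index and as a regular column. One possible alternative is `set_index`, sketched here on a toy frame (the `toy` and `indexed` names are made up for this example):

```python
import pandas as pd

toy = pd.DataFrame({
    "Date": pd.to_datetime(["2017-10-30", "2017-10-27"]),
    "Price": [362229.0, 10620.0],
})

# set_index returns a new frame with 'Date' as the index and, by default,
# removes it from the columns, so the dates are not stored twice.
indexed = toy.set_index("Date")

print(list(indexed.columns))         # ['Price']
print(type(indexed.index).__name__)  # DatetimeIndex
```

Either approach gives a DatetimeIndex, which is what `resample` needs. Note that the rest of this notebook still uses the Date column (for example when counting rows per week), so switching to `set_index` would mean counting another column instead.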

In [13]:
df.resample('B')['Price'].mean().head(10)


Out[13]:
Date
2017-01-03    203534.333333
2017-01-04     27428.000000
2017-01-05    158708.000000
2017-01-06     16051.666667
2017-01-09     11175.000000
2017-01-10      9210.000000
2017-01-11     72244.500000
2017-01-12    968264.300000
2017-01-13    206167.000000
2017-01-16     10815.000000
Freq: B, Name: Price, dtype: float64
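
To see what `resample('B')` does with the rows, here is a minimal sketch on three made-up prices (not taken from the dataset). Each business day becomes one bin, and the aggregation decides what ends up in it.

```python
import pandas as pd

# Two trades on Tuesday 3 January 2017, none on Wednesday, one on Thursday.
prices = pd.Series(
    [100.0, 300.0, 50.0],
    index=pd.to_datetime(["2017-01-03", "2017-01-03", "2017-01-05"]),
    name="Price",
)

by_day = prices.resample("B")  # one bin per business day

mean_price = by_day.mean()     # the empty Wednesday bin becomes NaN
n_trades = by_day.count()      # the empty Wednesday bin becomes 0

print(n_trades.tolist())       # [2, 0, 1]
```

Days without any transaction still get a row, which is why a resampled series can be longer than the list of days that actually appear in the data.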

Let's plot that


In [ ]:
!pip install matplotlib

In [39]:
import matplotlib.pyplot as plt
print(plt.style.available)


['dark_background', 'seaborn-talk', 'seaborn-colorblind', 'seaborn-ticks', 'ggplot', 'seaborn-pastel', 'seaborn-muted', 'bmh', 'seaborn-darkgrid', 'fivethirtyeight', 'seaborn-dark', 'seaborn-white', 'seaborn', 'seaborn-paper', 'seaborn-dark-palette', 'seaborn-poster', 'seaborn-whitegrid', 'seaborn-bright', '_classic_test', 'seaborn-deep', 'classic', 'grayscale', 'seaborn-notebook']

In [46]:
import matplotlib.pyplot as plt
import matplotlib
plt.style.use('fivethirtyeight')
%matplotlib inline

In [47]:
df.resample('W')['Date'].count().plot()


Out[47]:
<matplotlib.axes._subplots.AxesSubplot at 0x10ba394e0>

In [24]:
df.resample('W')['Price'].sum().plot()


Out[24]:
<matplotlib.axes._subplots.AxesSubplot at 0x114317b70>

In [21]:
df['Company'].value_counts()


Out[21]:
nebag ag                                 57
Chocoladefabriken Lindt & Sprüngli AG    37
Vontobel Holding AG                      36
Credit Suisse Group AG                   30
Altin AG                                 27
LEM Holding SA                           25
Edisun Power Europe AG                   25
VZ Holding AG                            25
Bobst Group SA                           22
ams AG                                   21
SGS SA                                   21
Partners Group Holding AG                19
Schindler Holding AG                     19
Logitech International S.A.              19
u-blox Holding AG                        18
Burkhalter Holding AG                    18
Bossard Holding AG                       16
Leonteq AG                               15
Kühne + Nagel International AG           15
Lonza Group AG                           14
Bâloise Holding AG                       14
Komax Holding AG                         14
Banque Cantonale de Genève               14
Ypsomed Holding AG                       13
Geberit AG                               13
Galenica AG                              13
Sonova Holding AG                        13
Straumann Holding AG                     12
Novartis AG                              12
INFICON Holding AG                       12
                                         ..
Intershop Holding AG                      2
Rieter Holding AG                         1
Emmi AG                                   1
Perrot Duval Holding SA                   1
Actelion Ltd                              1
dorma+kaba Holding AG                     1
Georg Fischer AG                          1
SFS Group AG                              1
VP Bank AG                                1
Arundel AG                                1
Bank Coop AG                              1
Von Roll Holding AG                       1
Cicor Technologies Ltd.                   1
Swiss Prime Site AG                       1
ARYZTA AG                                 1
Schlatter Industries AG                   1
Thurgauer Kantonalbank                    1
Dätwyler Holding AG                       1
Züblin Immobilien Holding AG              1
Alpiq Holding AG                          1
BB Biotech AG                             1
SCHMOLZ+BICKENBACH AG                     1
Evolva Holding SA                         1
Mikron Holding AG                         1
CPH Chemie + Papier Holding AG            1
Orell Füssli Holding AG                   1
Walter Meier AG                           1
Starrag Group Holding AG                  1
Basler Kantonalbank                       1
Warteck Invest AG                         1
Name: Company, Length: 164, dtype: int64

In [27]:
abb = df[df['Company'] == 'ABB Ltd']

In [28]:
abb


Out[28]:
Unnamed: 0 Company Date Price Share Total Type Price_m
Date
2017-10-30 0 ABB Ltd 2017-10-30 362229.0 14323 Purchase 0.4
2017-10-30 1 ABB Ltd 2017-10-30 304289.0 12032 Purchase 0.3
2017-10-30 2 ABB Ltd 2017-10-30 10060.0 500 Purchase 0.0
2017-10-30 3 ABB Ltd 2017-10-30 10060.0 500 Purchase 0.0
2017-06-02 419 ABB Ltd 2017-06-02 148260.0 6000 Sale 0.1
2017-03-15 928 ABB Ltd 2017-03-15 48582.0 2128 Sale 0.0
2017-03-15 929 ABB Ltd 2017-03-15 45993.0 2019 Sale 0.0
2017-03-15 930 ABB Ltd 2017-03-15 22960.0 1007 Sale 0.0
2017-03-15 931 ABB Ltd 2017-03-15 15549.0 679 Sale 0.0
2017-02-27 1003 ABB Ltd 2017-02-27 326076.0 14409 Purchase 0.3
2017-02-27 1004 ABB Ltd 2017-02-27 187037.0 8265 Purchase 0.2

Shall we print this out?


In [31]:
abb.resample('B')['Price']


Out[31]:
Date
2017-10-30    362229.0
2017-10-30    304289.0
2017-10-30     10060.0
2017-10-30     10060.0
2017-06-02    148260.0
2017-03-15     48582.0
2017-03-15     45993.0
2017-03-15     22960.0
2017-03-15     15549.0
2017-02-27    326076.0
2017-02-27    187037.0
Name: Price, dtype: float64

In [32]:
abb.resample('D')['Price'].plot()
plt.savefig('hello.jpg')


/Users/barneyjs/.virtualenvs/master/lib/python3.5/site-packages/matplotlib/axes/_base.py:2917: UserWarning: Attempting to set identical left==right results
in singular transformations; automatically expanding.
left=736387.0, right=736387.0
  'left=%s, right=%s') % (left, right))
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-32-278324abacc7> in <module>()
----> 1 abb.resample('D')['Price'].plot()
      2 plt.savefig('hello.jpg')

~/.virtualenvs/master/lib/python3.5/site-packages/pandas/core/groupby.py in __call__(self, *args, **kwargs)
    347             return self.plot(*args, **kwargs)
    348         f.__name__ = 'plot'
--> 349         return self._groupby.apply(f)
    350 
    351     def __getattr__(self, name):

~/.virtualenvs/master/lib/python3.5/site-packages/pandas/core/groupby.py in apply(self, func, *args, **kwargs)
    714         # ignore SettingWithCopy here in case the user mutates
    715         with option_context('mode.chained_assignment', None):
--> 716             return self._python_apply_general(f)
    717 
    718     def _python_apply_general(self, f):

~/.virtualenvs/master/lib/python3.5/site-packages/pandas/core/groupby.py in _python_apply_general(self, f)
    718     def _python_apply_general(self, f):
    719         keys, values, mutated = self.grouper.apply(f, self._selected_obj,
--> 720                                                    self.axis)
    721 
    722         return self._wrap_applied_output(

~/.virtualenvs/master/lib/python3.5/site-packages/pandas/core/groupby.py in apply(self, f, data, axis)
   1800             # group might be modified
   1801             group_axes = _get_axes(group)
-> 1802             res = f(group)
   1803             if not _is_indexed_like(res, group_axes):
   1804                 mutated = True

~/.virtualenvs/master/lib/python3.5/site-packages/pandas/core/groupby.py in f(self)
    345     def __call__(self, *args, **kwargs):
    346         def f(self):
--> 347             return self.plot(*args, **kwargs)
    348         f.__name__ = 'plot'
    349         return self._groupby.apply(f)

~/.virtualenvs/master/lib/python3.5/site-packages/pandas/plotting/_core.py in __call__(self, kind, ax, figsize, use_index, title, grid, legend, style, logx, logy, loglog, xticks, yticks, xlim, ylim, rot, fontsize, colormap, table, yerr, xerr, label, secondary_y, **kwds)
   2451                            colormap=colormap, table=table, yerr=yerr,
   2452                            xerr=xerr, label=label, secondary_y=secondary_y,
-> 2453                            **kwds)
   2454     __call__.__doc__ = plot_series.__doc__
   2455 

~/.virtualenvs/master/lib/python3.5/site-packages/pandas/plotting/_core.py in plot_series(data, kind, ax, figsize, use_index, title, grid, legend, style, logx, logy, loglog, xticks, yticks, xlim, ylim, rot, fontsize, colormap, table, yerr, xerr, label, secondary_y, **kwds)
   1892                  yerr=yerr, xerr=xerr,
   1893                  label=label, secondary_y=secondary_y,
-> 1894                  **kwds)
   1895 
   1896 

~/.virtualenvs/master/lib/python3.5/site-packages/pandas/plotting/_core.py in _plot(data, x, y, subplots, ax, kind, **kwds)
   1692         plot_obj = klass(data, subplots=subplots, ax=ax, kind=kind, **kwds)
   1693 
-> 1694     plot_obj.generate()
   1695     plot_obj.draw()
   1696     return plot_obj.result

~/.virtualenvs/master/lib/python3.5/site-packages/pandas/plotting/_core.py in generate(self)
    241     def generate(self):
    242         self._args_adjust()
--> 243         self._compute_plot_data()
    244         self._setup_subplots()
    245         self._make_plot()

~/.virtualenvs/master/lib/python3.5/site-packages/pandas/plotting/_core.py in _compute_plot_data(self)
    350         if is_empty:
    351             raise TypeError('Empty {0!r}: no numeric data to '
--> 352                             'plot'.format(numeric_data.__class__.__name__))
    353 
    354         self.data = numeric_data

TypeError: Empty 'DataFrame': no numeric data to plot
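
The traceback above comes from calling `.plot()` directly on the resampled object, before the bins have been aggregated. A likely fix, sketched here on a few made-up rows (the `abb_toy` and `daily_total` names are hypothetical), is to reduce each bin to one number first, for example with `.sum()`:

```python
import pandas as pd

# Toy stand-in for the filtered ABB frame, indexed by date.
abb_toy = pd.DataFrame(
    {"Price": [148260.0, 362229.0, 304289.0]},
    index=pd.to_datetime(["2017-06-02", "2017-10-30", "2017-10-30"]),
).sort_index()

# Aggregating turns the resampler into an ordinary Series,
# one value per calendar day (0.0 for days without a trade).
daily_total = abb_toy.resample("D")["Price"].sum()

print(type(daily_total).__name__)                   # Series
print(daily_total.loc[pd.Timestamp("2017-10-30")])  # 666518.0
```

Because `daily_total` is a plain Series, `daily_total.plot()` followed by `plt.savefig(...)` should then work as in the earlier weekly plots.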

In [18]:
!ls


01 Pandas-Übung (Sven).ipynb         P3_GrantExport.csv
02 pandas II, dates & plotting.ipynb geckodriver.log
03 BeautifulSoup Übung.ipynb         hello.pdf
04 Selenium.ipynb                    scraped_and_cleand_six.csv

But what about different chart types?


In [33]:
df.resample('M')['Price'].sum().plot(kind='bar')


Out[33]:
<matplotlib.axes._subplots.AxesSubplot at 0x10be330f0>

In [34]:
df.resample('M')['Price'].sum().plot(kind='barh')


Out[34]:
<matplotlib.axes._subplots.AxesSubplot at 0x10c035940>

In [35]:
df.resample('M')['Price'].sum().plot(kind='pie')


Out[35]:
<matplotlib.axes._subplots.AxesSubplot at 0x10c0c90b8>

In [36]:
df.resample('M')['Price'].sum().plot(kind='pie', radius=0.5, shadow=True)


Out[36]:
<matplotlib.axes._subplots.AxesSubplot at 0x10c2c6da0>

In [37]:
df.resample('Q')['Price'].sum().plot(kind='pie', radius=0.5, shadow=True)


Out[37]:
<matplotlib.axes._subplots.AxesSubplot at 0x10c1bd518>

In [38]:
labels = 'Q1', 'Q2', 'Q3', 'Q4'
colors = ['pink', 'grey', 'grey', 'grey']
explode = (0.05, 0.05, 0.05, 0.05)
plt.axis('equal')
df.resample('Q')['Price'].sum().plot(
    kind='pie', radius=0.5, autopct='%0.0f%%', shadow=False,
    labels=labels, colors=colors, explode=explode)


Out[38]:
<matplotlib.axes._subplots.AxesSubplot at 0x10bc459b0>
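
The hard-coded labels only line up if the data really produces four quarterly slices in that order. As a hedged alternative, the labels can be derived from the grouped index itself. This sketch uses made-up numbers and groups by quarterly period rather than `resample('Q')`, partly because newer pandas releases deprecate the 'Q' offset alias in favour of 'QE'.

```python
import pandas as pd

# Toy prices with no transaction in the third quarter.
prices = pd.Series(
    [10.0, 20.0, 30.0, 40.0],
    index=pd.to_datetime(["2017-01-15", "2017-02-15", "2017-05-01", "2017-10-30"]),
)

# Group by quarterly period; only quarters that contain data appear.
quarterly = prices.groupby(prices.index.to_period("Q")).sum()

# Build one label per slice from the index instead of typing them by hand.
labels = ["Q{}".format(period.quarter) for period in quarterly.index]

print(labels)              # ['Q1', 'Q2', 'Q4']
print(quarterly.tolist())  # [30.0, 30.0, 40.0]
```

Passing `labels=labels` to `quarterly.plot(kind='pie', ...)` then keeps the labels and the slices in sync even when a quarter is missing.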
